Abstract

Manipulation of multimedia data is not as straightforward as manipulation of data in conventional databases. One main problem is retrieving multimedia data from the database, which requires matching the contents of multimedia data to a user query. To achieve content-based retrieval, our approach uses natural language captions that allow the user to describe the contents of multimedia data. In a similar manner, users specify their queries on multimedia data contents in natural language form. A problem is that different users, or even the same user at different times, describe the same thing differently, so the descriptions of multimedia data contents rarely match the descriptions in user queries exactly. Hence, partial or approximate matching between descriptions of multimedia data and user queries is generally required during multimedia data retrieval. We propose an intelligent approach to approximate matching that integrates object-oriented and natural language understanding techniques. To make query specification easier, we also develop a graphical user interface that supports incremental query specification and a natural way of expressing joins. The Multimedia Database Management System (MDBMS) described in this paper incorporates these capabilities.

1. Introduction

A multimedia database management system supports the management of multimedia data, which includes image and sound among others, in addition to supporting conventional databases. Multimedia systems are currently gaining a lot of attention because today’s technology has made it possible to capture and store multimedia data in computers. Multimedia data broadens the communication between the computer system and the user. Many applications, such as military, publishing and instructional applications, routinely need multimedia data. Although the cost of the hardware required to handle multimedia data is decreasing rapidly, the software needed to manage such data is lacking or does not meet application needs.

In this paper we present a Multimedia Database Management System (MDBMS). The system allows sophisticated handling of multimedia data, featuring intelligent data retrieval as well as a graphical interface for user interaction. Besides describing the overall system architecture, we present the important parts of the system, such as the Parser, the Matcher and the Graphical User Interface, in more detail.

One important achievement of the MDBMS system is an efficient method for retrieving multimedia data by way of inexact matching. In conventional databases, retrieval of standard numerical and alphanumeric data is handled by utilizing the content of the data. The fundamental problem that one must face in the context of a multimedia database is how to handle content search, and there is no easy solution. It is difficult to find the appropriate data conveniently and efficiently based on the contents of multimedia data because these contents are intrinsically rich in semantics. In developing an efficient retrieval method for multimedia data, we concluded that it is not possible to utilize the content directly with today’s technology. This is a fair conclusion, since the content of a multimedia data item is mostly unstructured, complex data such as an image or a sound.

In our MDBMS system we take the approach of content-based search by means of verbal descriptions of the contents of multimedia data. We argue that the well-known keyword approach to content description is not suitable because it is known to be imprecise and users often have difficulty focusing the search on the data of interest. Hence, we adopt the natural language approach to content description as a more viable option. Since full understanding of natural language is not yet achievable, we use a caption-based approach to express the description of media data. To achieve automatic interpretation of captions, we exploit techniques from natural language understanding and artificial intelligence.

The methodology we adopt consists of associating natural language captions with each multimedia data item and using these descriptions to retrieve the relevant data. More precisely, the description of a multimedia data item is matched against the description in a user query, which is also expressed as a natural language caption. The major problem with this approach is that the description of a multimedia data item generally does not exactly match the description in a user query. The reason is that it is difficult for different users, or even the same user at different times, to describe the same thing identically, because they may use synonyms, or more general or more specific categories from the domain of interest, and so on. Hence, the key to an efficient retrieval process is to automatically perform a partial or approximate match of the description of a multimedia data item to the description in a user query whenever an exact match is not possible. In this paper, we propose an intelligent approach to approximate matching that integrates object-oriented and natural language understanding techniques.

The second issue addressed in this paper is new ways of interacting with the user. The user interface is an important part of a system and strongly determines how effectively the system can be used. To achieve a natural way of interacting with the MDBMS system, we are developing a graphical user interface that makes query specification easier than query languages such as SQL. We found that, to formulate complex queries, users partition them into smaller pieces and put the pieces together at a later stage. This behavior is reflected in the principle of incremental query specification, which is supported by our Graphical User Interface. In addition, we observed that, for a given database, the joins necessary to specify most queries correspond directly to natural language expressions. This leads to the principle of natural expression of joins, which is also supported by our Graphical User Interface. Both principles are useful not only for multimedia systems or graphical user interfaces but for any database query interface.

This paper makes three contributions. The first contribution is to show that content description of multimedia data is possible using natural language captions that can be interpreted automatically using domain-dependent knowledge. The second contribution is the formulation of a general scheme for retrieving the variety of multimedia data stored in a database, with special emphasis on approximate matching. As far as we know, very little research in natural language processing has addressed partial or approximate matching, especially in natural language applications. The retrieval method may also be easily adopted in the field of intelligent information retrieval. In doing so, we support the claim that object-oriented technology can be adopted and easily applied to multimedia system applications. The third contribution is the identification and application of two principles in the construction of a graphical user interface that help make the query specification process easier.

The paper is organized as follows: Section 2 discusses related work. Section 3 addresses fundamental problems and outlines the architecture of the Multimedia Database Management System (MDBMS). Section 4 describes the natural language interpretation capabilities of the parser. Section 5 describes our approximate match algorithm for the retrieval of multimedia data, and Section 6 gives a short overview of the user interface. Finally, Section 7 gives a summary.

2. Related Work

Several multimedia projects have been undertaken by various researchers in both academia and industry over the past several years. The MINOS system [CHRIS86], developed by a team at the University of Toronto, manages highly structured multimedia objects that consist of attributes as well as text, image and voice parts. Sophisticated browsing and user interface features allow browsing of the schema as well as synchronized updates. The MCC Database program [WOEL86, WOEL87] also undertook several multimedia projects by establishing the database requirements of multimedia applications. They identified requirements for a data model and for the sharing and manipulation of multimedia data. Hypertext has also been extended to manage images and sound. One notable outcome is the INTERMEDIA system [YANK88], developed at Brown University. [MASU87] developed a framework to classify and compare the different projects.

The user interface is an important part of a database system, especially when dealing with multimedia data because of its non-textual nature. Most research in the area of user interfaces focuses on the entity-relationship model [WONG82, FOGG84, ROGE88] or the more complex semantic and object-oriented data models [KING84, GOLD85, BRYC86, AGRA90], allowing queries to be specified directly within the schema. In contrast, we use an extension of the relational model to handle and manipulate the media data. To allow easy query specification, we provide a graphical user interface that incorporates incremental query specification and a natural way of expressing joins, differing in many ways from the well-known QBE interface [ZLOO77].

Another important aspect of a multimedia database system is content retrieval of media data. The fundamental difficulty in retrieving multimedia data lies in handling the rich semantics contained in the data. In [LUM89], we introduced the approach of content-based search by means of natural language descriptions that form part of a multimedia data item. This approach is related to research in artificial intelligence (AI) and information retrieval (IR). In the area of AI, a variety of methods have been developed for processing natural language. Although the problem of full natural language understanding has not yet been solved satisfactorily, powerful tools for parsing and interpreting natural language have been developed. [GROSZ87] exemplifies the current state of the art. Most of this work focuses on complete understanding of natural language, requiring extensive knowledge bases with general world knowledge. Our approach is simpler. We deal only with a subset of natural language that is broad enough to allow a natural description of the media data but easier to understand than the full, general language. Furthermore, we found that for most applications the knowledge base is domain specific, allowing us to deal with a much smaller knowledge base for each domain. Both aspects contribute to acceptable performance, which is critical for a database system.

In the domain of IR there was early interest in using AI techniques [SPAR78, SMIT80]. The IRUS system [BATES83], designed for processing heterogeneous databases through natural language queries, is more representative of modern attempts. The RUBRIC system [TONG87] is a production rule-based IR system in which the indexing base contains positional information about words in the texts, which allows positional controls on words while processing queries. The I3R system [CROF87] assists users at all stages of the retrieval process and consists of a set of expert systems managed by a scheduler. Last but not least, the IOTA system [CHIA87] tries to improve the qualitative performance of IR systems by replacing keywords with noun groups involving extensive semantics.

The approach we propose is somewhat different from the intelligent IR systems mentioned. Most of the work in these systems is mainly concerned with natural language processing, particularly query processing, and with deductive capabilities based on an extended semantic model of document content and sometimes of the user. Our approach also shares these characteristics. However, in many systems the matching function between system concepts and user concepts is based on exact matching, while our approach is based on approximate matching. Even in systems with approximate matching capabilities, the matching functions used are primitive or superficial at best compared with our approach, which integrates object-oriented technology with natural language understanding to improve the quality of the matching process.

3. Architecture of Multimedia Database Management System (MDBMS)

In this section, we outline the architecture of the MDBMS, which consists of the various components of the MDBMS system. Before we continue, we address definitions and various issues associated with the data model used in the MDBMS system.

3.1 Definitions and Background

As mentioned before, multimedia data, in the broadest sense, consists of unformatted data such as text, image, voice, signals, etc. in addition to alphanumeric data. We define a multimedia database management system (MDBMS) as a system that manages all multimedia data and provides mechanisms to handle concurrency, consistency and recovery, in addition to providing a query language and query processing.

Despite differences in data models and implementation aspects, all research projects on MDBMS have decided to organize multimedia data using the abstract data type (ADT) concept. This is generally accepted as the adequate approach. However, none of the projects has addressed the problem of content retrieval of multimedia data.

The fundamental difficulty in handling multimedia data is intrinsically tied to their very rich semantics. To illustrate this difficulty, let us look at an image of ships. Given such a picture, how are we to know what type of ships are in it? In other words, are the ships destroyers, cruisers, submarines or passenger ships? As another example, suppose that there is a picture of a dog and a cat. How do we know whether they are chasing each other or playing?

To answer queries posed on images, for example, a person must draw on a very rich experience of life to arrive at a good answer. One must have a sophisticated technique for analyzing the contents of the images to get the semantics of the different things in them. Technology today is not advanced enough to expect systems to have this kind of capability to answer multimedia queries. However, we can use both AI and IR technology to do the next best thing: we can abstract the contents of multimedia data into words or text and use this text description, the equivalent of the original multimedia data, to match the user request or query. This is the principle we use in designing an MDBMS to handle multimedia data for different applications. Figure 1 shows the format of a multimedia data item, which consists of registration, raw and description data.

Raw data is the bit string representation of the image, sound, signal, etc., obtained by scanning or digitizing the original multimedia data. Registration data generally enhances the information about raw data and is not redundant. The contents of a multimedia data item are described by description data. Description data cannot be derived automatically by the computer with today’s technology. We assume that users will supply the description data for multimedia data in natural language form.
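The following Python sketch models one multimedia data item as a record, to make the three-part format concrete. It is an illustration only, not the MDBMS storage format. The class name `MultimediaData`, the field names and the sample values are hypothetical. The `predicates` field assumes that the parser's output is attached to the record once the description has been parsed.

```python
from dataclasses import dataclass, field


@dataclass
class MultimediaData:
    """Hypothetical record holding the three parts of one multimedia data item."""
    registration: dict  # e.g. format and resolution; enhances the raw data
    raw: bytes          # bit string obtained by scanning or digitizing
    description: str    # natural language caption supplied by a user
    # Predicates produced later by the parser from the description.
    predicates: list = field(default_factory=list)


photo = MultimediaData(
    registration={"format": "GIF", "width": 640, "height": 480},
    raw=b"GIF89a",  # placeholder bytes, not a real image
    description="A car with red body",
)
print(photo.description)  # A car with red body
print(photo.predicates)   # []
```

Each record gets its own empty predicate list through `default_factory`, so items parsed later never share state.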


3.2 Architecture

In this section, we present the various components of our MDBMS. This is a modified version of the MDBMS architecture discussed in [LUM89]. Our proposed architecture enhances the performance of the matcher component and adds user interface capabilities that are lacking in the architecture proposed in [LUM89].

As shown in Figure 2, the components break down into the user interface, the query processor, the data access subsystem and the intelligent retrieval subsystem. The data access subsystem consists of a conventional data manager and a media manager and controls access to the actual data stored in the relational and media DBMS. The intelligent retrieval subsystem is composed of the parser, generator, matcher and description manager. The query processor accepts queries from users and executes them by calling the other components. When a new description of a multimedia data item is entered, for example, the query processor calls the parser. The parser uses the dictionary to produce first-order predicates and returns them to the query processor. The query processor then hands the predicates over to the description manager, which links the description to its multimedia data.

Figure 2: Architecture of MDBMS System (Data Access Subsystem; Intelligent Retrieval Subsystem)

When the query processor receives a query, its first task is to decompose the query into subqueries affecting only the conventional or the media part. The conventional subquery is passed directly to the conventional data manager without modification. For the text description, the query processor calls the natural language parser to obtain the equivalent query predicates. The predicates are then handed to the matcher, which tries to match the query with the qualified multimedia data by comparing the predicates of the query with those of the stored multimedia data. The matcher does this by calling the description manager and using domain knowledge. In addition, if an exact match is not possible, the matcher automatically switches to approximate matching. To guide the matching process, the matcher also gets input from the user. As the solution to the natural language part of the query, the query processor receives links to the qualified multimedia data. After these links are combined with the results of the conventional subquery, the Data Access Subsystem retrieves the final results.
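The query flow just described can be sketched as follows. This is a toy under stated assumptions, not the prototype's code. The names `Query`, `conventional_subquery`, `media_subquery`, `process` and `word_matcher` are hypothetical. An in-memory dictionary stands in for the conventional data manager, and a pluggable `matcher` function stands in for the parser and matcher together. Combining the two partial results is modeled as an intersection of sets of data identifiers.

```python
from dataclasses import dataclass


@dataclass
class Query:
    conventional: dict  # conditions on formatted attributes, e.g. {"type": "photo"}
    caption: str        # natural language part of the query


def conventional_subquery(rows, conditions):
    """Stand-in for the conventional data manager: ids of rows meeting every condition."""
    return {rid for rid, attrs in rows.items()
            if all(attrs.get(key) == value for key, value in conditions.items())}


def media_subquery(captions, query_caption, matcher):
    """Stand-in for parser plus matcher: ids whose description matches the query caption."""
    return {rid for rid, text in captions.items() if matcher(text, query_caption)}


def process(query, rows, captions, matcher):
    # Decompose the query, evaluate each part, then combine the links to qualified data.
    return (conventional_subquery(rows, query.conventional)
            & media_subquery(captions, query.caption, matcher))


def word_matcher(description, query_caption):
    # Placeholder matcher: every query word must occur in the description.
    return set(query_caption.lower().split()) <= set(description.lower().split())


rows = {1: {"type": "photo"}, 2: {"type": "photo"}, 3: {"type": "map"}}
captions = {1: "a red car", 2: "a blue ship", 3: "a red car on a map"}
print(process(Query({"type": "photo"}, "red car"), rows, captions, word_matcher))  # {1}
```

The placeholder `word_matcher` only checks word containment. In the MDBMS that role belongs to the predicate-based matcher discussed in Section 5.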

The query processor, conventional and media object managers, description manager, parser and matcher have already been implemented as part of the MDBMS prototype system developed at the Naval Postgraduate School [MEYE88, LUM89, HOLT90, PEI90]. In this paper, we describe the main components of the system: the natural language understanding capabilities of the parser, the proposed approximate matching process in the matcher and the interaction technique of the user interface.

4. Natural Language Understanding in the Parser

In this section we describe the natural language understanding capabilities of the parser. We point out that full understanding of natural language is not necessary to accomplish the goal of content retrieval of multimedia data. However, a restricted interpretation is necessary. The parser component performs it, using the application-dependent dictionary as a semantic basis.

4.1 Natural Language Description for Multimedia Data

As mentioned, we propose to retrieve multimedia data by matching the natural language descriptions against the query specifications. We discarded the keyword search technique as a viable option because keywords are discrete and lack the complex linking mechanisms needed to adequately capture the contents of multimedia data. In addition, it is not always possible to convey exact meanings using only keywords. In contrast, natural language descriptions can describe all kinds of multimedia data, with the additional advantage that everyone is familiar with natural language, which results in a high acceptance rate.

We believe that unrestricted natural language processing is very difficult to achieve given today’s AI technology. We found that the language needed to describe multimedia data is much more formal than everyday English. Hence, instead of unrestricted natural language descriptions, we use captions to describe multimedia data. Captions are a natural but special, stylized way of writing descriptions with a subset of natural language, and they are not as difficult to parse and interpret as general natural language.

Additionally, for a particular multimedia application the universe of discourse is usually quite constrained. Nouns tend to be concrete, and most multimedia databases emphasize still photographs and other fixed-time graphics to which few verbs can be applied, thereby easing a difficult aspect of natural language processing. Importantly, we use natural language only to access entities in a database, making complete understanding of all aspects of a word unnecessary. The details of captions and their restrictions for our objectives are beyond the scope of this paper and are given in [HOLT90, ROWE91].

4.2 Dictionary

Besides the captions themselves, our system requires auxiliary information from a dictionary. The dictionary, or lexicon, is necessary for parsing and gives each possible natural language word its semantics: its part of speech, its grammatical form and the form of the literals needed to represent it. Many words, for example conjunctions and qualifying adjectives, are consistent in meaning across a wide range of domains; thus we can borrow their interpretation from existing natural language systems and include them in every dictionary. The words that change significantly between applications are nouns and a few verbs. These need to be defined separately for every application domain, but their meaning is mostly straightforward. To simplify matching, we are trying to limit the properties and relationships to a small set of primitives. For example, we will not distinguish between the relationships asserted by the terms ‘within’, ‘inside’, ‘part of’, ‘containing’ and ‘comprising’. This can be done without loss because efficient retrieval does not require capturing the full meaning of an English expression, only its main intent.
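The Python sketch below shows what a dictionary fragment that collapses the relationship terms listed above into a single primitive might look like. Several details are our assumptions for illustration: the primitive name `part_of`, the function `normalize`, and the choice to swap arguments for ‘containing’ and ‘comprising’. The swap means that “a car containing an engine” and “an engine inside a car” yield the same literal.

```python
# Hypothetical dictionary fragment: phrase -> (primitive relation, swap arguments?).
# "X within Y" and "Y containing X" both become part_of(X, Y); the argument
# swap for 'containing'/'comprising' is an assumption about direction.
RELATION_PRIMITIVES = {
    "within": ("part_of", False),
    "inside": ("part_of", False),
    "part of": ("part_of", False),
    "containing": ("part_of", True),
    "comprising": ("part_of", True),
}


def normalize(phrase, left, right):
    """Map 'left <phrase> right' to a single primitive predicate tuple."""
    primitive, swap = RELATION_PRIMITIVES[phrase.lower()]
    return (primitive, right, left) if swap else (primitive, left, right)


print(normalize("inside", "engine", "car"))      # ('part_of', 'engine', 'car')
print(normalize("containing", "car", "engine"))  # ('part_of', 'engine', 'car')
```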

The dictionary is an important, application-dependent part of the system. To allow interpretation of natural language captions, it defines the domain of each application, thus restricting the vocabulary, the semantics and the knowledge the system must apply.

4.3 Natural Language Interpretation

The parser translates the text description into a set of predicates called a merging list. Transforming natural language descriptions into a set of predicates considerably reduces their imprecision and ambiguity. These predicates state facts about the real-world entities involved with multimedia data, such as their properties and relationships. As in most parsing methods, we chose first-order predicate calculus as the formal representation of the description data. The parser depends on the dictionary to turn the descriptions into predicates. It is the parser’s task to use the dictionary to resolve synonyms and to check the syntactic context to resolve lexical ambiguities.

Our parser also provides mechanisms to automatically partition a user query into subject, verb and object components. This is essential because, as we will see later, during data retrieval the partitioned components can be matched against domain-dependent knowledge, which is also broken down into subject, verb and object categories. Other important features of the parser are the use of supercaptions, a generalization of captions, and of frames for stereotypical actions, which allow a set of predicates to be derived from terms in the description.
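A deliberately naive Python sketch can illustrate the subject, verb and object partitioning. The lexicon entries and the function `partition` are hypothetical. The sketch applies a simple rule: nouns before the first verb form the subject, and nouns after it form the object. The actual parser relies on augmented transition network parsing instead.

```python
# Hypothetical dictionary entries: word -> part of speech.
LEXICON = {
    "a": "det", "the": "det",
    "destroyer": "noun", "carrier": "noun", "submarine": "noun",
    "escorts": "verb", "attacks": "verb",
}


def partition(caption):
    """Split a simple caption into subject, verb and object components.

    Nouns before the first verb form the subject and nouns after it the
    object; determiners, unknown words and any later verbs are ignored.
    """
    parts = {"subject": [], "verb": [], "object": []}
    slot = "subject"
    for word in caption.lower().split():
        pos = LEXICON.get(word)
        if pos == "verb" and slot == "subject":
            parts["verb"].append(word)
            slot = "object"
        elif pos == "noun":
            parts[slot].append(word)
    return parts


print(partition("A destroyer escorts the carrier"))
# {'subject': ['destroyer'], 'verb': ['escorts'], 'object': ['carrier']}
```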

Our current implementation of the parser uses augmented transition network parsing and interpretation routines. It is implemented in Quintus Prolog and runs on a SUN SPARC workstation. The details of the parser and the predicates are beyond the scope of this paper and are given in [LUM89, HOLT90, DULL90, ROWE91].

An example of a natural language description and its translation by the parser into an equivalent set of predicates is shown below:

Description: “A car with red body”

Predicates: car(x), component(x,y), body(y), color(y,red)

Choosing the right set of predicates is a very difficult task, comparable to knowledge acquisition for expert systems. For the purposes of this paper, it is sufficient to assume that the dictionary lists all the words the parser can recognize, all the parts of speech associated with each word, and the predicates to use when a word appears in a description. Thus, the set of all predicates that can be used in the descriptions must be defined in the dictionary.
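A tiny Python sketch can mimic the translation of the example caption, under the assumption just stated that every word and its predicate come from the dictionary. `DICTIONARY` and `translate` are hypothetical and cover only the words of the example. The sketch also assumes one fresh variable per noun, a preposition linking the first noun to the following noun, and adjectives attaching to the next noun. It is not the ATN-based parser.

```python
# Hypothetical dictionary entries: word -> (part of speech, predicate name).
DICTIONARY = {
    "a": ("det", None),
    "car": ("noun", "car"),
    "body": ("noun", "body"),
    "with": ("prep", "component"),
    "red": ("adj", "color"),
}


def translate(caption):
    """Turn a very simple caption into predicate tuples, one variable per noun."""
    variables = iter("xyzuvw")
    predicates, pending_adjectives = [], []
    head = None  # variable of the first noun, i.e. the entity being described
    link = None  # relation introduced by a preposition, waiting for its noun
    for word in caption.lower().split():
        pos, name = DICTIONARY[word]  # KeyError for words the dictionary lacks
        if pos == "adj":
            pending_adjectives.append((name, word))
        elif pos == "prep":
            link = name
        elif pos == "noun":
            var = next(variables)
            if head is None:
                head = var
            elif link is not None:
                predicates.append((link, head, var))
                link = None
            predicates.append((name, var))
            predicates.extend((adj, var, value) for adj, value in pending_adjectives)
            pending_adjectives = []
        # determiners ("det") produce no predicate and are skipped
    return predicates


print(translate("A car with red body"))
# [('car', 'x'), ('component', 'x', 'y'), ('body', 'y'), ('color', 'y', 'red')]
print(translate("A red car"))
# [('car', 'x'), ('color', 'x', 'red')]
```

The second call reproduces the query predicates “car(x), color(x,red)” used in Section 5.1.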

5. Matching

In this section, we propose new ways of matching natural language descriptions of multimedia data with query specifications. The key to our matching process is the use of domain knowledge represented using the notion of a class hierarchy borrowed from the object-oriented field. Before we continue, we first discuss some specific problems in our current matching capability that we alluded to earlier. These serve as the motivation for our new intelligent approach to approximate matching.

5.1 Problems in Matching

In our current system [LUM89, HOLT90], the result of parsing is one set of predicates per multimedia data instance. A query description is also entered in natural language and parsed. The arguments of the query predicates can be variables. A multimedia data instance is selected as a result of the query if there exists a binding of the query predicates to the description predicates of that instance. The match of a user query to multimedia data need not be exact. A set of rules, sometimes domain dependent, specifies situations in which sets of predicates that look different are really the same thing.
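An instance is selected when “there exists a binding of query predicates to description predicates,” and finding such a binding amounts to a backtracking search, sketched below in Python. Predicates are represented as tuples. We assume, as in Prolog, that query variables start with an uppercase letter and that lowercase terms such as `x` and `red` are constants. The names `unify` and `match` are hypothetical, and rule-based equivalences are not included.

```python
def is_var(term):
    # Assumed convention (as in Prolog): variables start with an uppercase letter.
    return term[:1].isupper()


def unify(pattern, fact, binding):
    """Extend binding so that the query pattern equals the stored fact, else None."""
    if pattern[0] != fact[0] or len(pattern) != len(fact):
        return None
    extended = dict(binding)
    for p, f in zip(pattern[1:], fact[1:]):
        if is_var(p):
            if extended.setdefault(p, f) != f:  # variable already bound to another term
                return None
        elif p != f:
            return None
    return extended


def match(query, description, binding=None):
    """Backtracking search for one binding of all query predicates, or None."""
    binding = {} if binding is None else binding
    if not query:
        return binding
    for fact in description:
        extended = unify(query[0], fact, binding)
        if extended is not None:
            result = match(query[1:], description, extended)
            if result is not None:
                return result
    return None


description = [("car", "x"), ("component", "x", "y"), ("body", "y"), ("color", "y", "red")]
print(match([("car", "A"), ("component", "A", "B"), ("color", "B", "red")], description))
# {'A': 'x', 'B': 'y'}
print(match([("car", "A"), ("color", "A", "red")], description))  # None
```

The second query fails, which is exactly the problem discussed next.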

The matching catches different natural language phrases with the same meaning, but not the semantic relationships among the predicates. For example, let us reconsider the description “a car with red body” of an image. The predicates generated are “car(x), component(x,y), body(y), color(y,red)”. For the sake of argument, consider a query with the description “a red car”. The query would be translated into something like “car(x), color(x,red)”. There would be no match because the system does not know that the color of a car’s body is identical to the color of the car. To overcome this problem, rules can be introduced to express the semantic relationships among the predicates. In the above case, the rule introduced could be:

if (car(X), component(X,Y), body(Y), color(Y,Z)) then color(X,Z)

Using the above rule, color(x,red) can be deduced in the example above, and there would be a match between the query and the description. A key unsolved problem, however, is deciding which literals of the predicates to generalize to get a match, and how far to generalize. This falls into the category of approximate matching of a user query that we mentioned earlier in the paper. We believe that the answer lies in the use of domain-dependent knowledge.
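The effect of the rule can be sketched with naive forward chaining in Python. The rule is applied to the stored description predicates until nothing new is derived, and after that the query “a red car” finds a binding. Names such as `forward_chain` and `bindings` are hypothetical, and variables are again assumed to start with an uppercase letter. This illustrates the idea only; it is not the MDBMS rule engine.

```python
def is_var(term):
    # Assumed convention (as in Prolog): variables start with an uppercase letter.
    return term[:1].isupper()


def unify(pattern, fact, binding):
    """Extend binding so that pattern equals fact, else return None."""
    if pattern[0] != fact[0] or len(pattern) != len(fact):
        return None
    extended = dict(binding)
    for p, f in zip(pattern[1:], fact[1:]):
        if is_var(p):
            if extended.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return extended


def bindings(goals, facts, binding=None):
    """Yield every binding under which all goals are among the facts."""
    binding = {} if binding is None else binding
    if not goals:
        yield binding
        return
    for fact in facts:
        extended = unify(goals[0], fact, binding)
        if extended is not None:
            yield from bindings(goals[1:], facts, extended)


def forward_chain(facts, rules):
    """Apply (body, head) rules repeatedly until no new fact is derived."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            for b in list(bindings(body, facts)):  # snapshot before adding facts
                new_fact = (head[0],) + tuple(b.get(t, t) for t in head[1:])
                if new_fact not in facts:
                    facts.add(new_fact)
                    changed = True
    return facts


# if (car(X), component(X,Y), body(Y), color(Y,Z)) then color(X,Z)
rule = ([("car", "X"), ("component", "X", "Y"), ("body", "Y"), ("color", "Y", "Z")],
        ("color", "X", "Z"))
description = [("car", "x"), ("component", "x", "y"), ("body", "y"), ("color", "y", "red")]
facts = forward_chain(description, [rule])
print(("color", "x", "red") in facts)                                     # True
print(next(bindings([("car", "A"), ("color", "A", "red")], facts), None))  # {'A': 'x'}
```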

If we were only interested in exact matching of a user query to the description of a multimedia data item, our current matching technique [LUM89, HOLT90] would be quite adequate. However, a common problem is that a user query is likely to produce an empty answer, because no description of stored multimedia data matches it exactly. In this case, an efficient system will try to perform approximate matching, whereby descriptions of multimedia data that satisfy some generalization of the user query are selected. Our objective, then, is to perform approximate matching of a user query efficiently. As mentioned earlier, our proposed approximate matching algorithm makes use of domain-dependent knowledge to meet this objective.

5.2  Domain-Dependent  Knowledge 

Earlier, we justified the use of captions to describe multimedia data by stating that each multimedia application restricts the scope of the description of multimedia data. This means that the domain of discourse for the captions is limited for each multimedia application. Domain-dependent knowledge consists of the key concepts in the domain of discourse of the captions. For our purposes, we include only noun and verb concepts in the domain-dependent knowledge.

To represent domain-dependent knowledge, we chose the object-oriented data model [BANE87, KIM89, ZDON90]. The object-oriented model supports highly structured, complex objects and can naturally capture any mini-world entity. The data model has been used widely in such areas as CAD/CAM, VLSI, office automation, software engineering and AI. We chose the object-oriented model to represent domain-dependent knowledge for two reasons. First, it supports generalization and specialization abstraction, which permits conceptual generalization on the contents of the captions. Second, researchers [WOEL87, HOLT90] have identified the object-oriented model as an appropriate and viable option for multimedia database applications.

Figure 3: Generalization Hierarchy of a Plane

Without loss of generality, we restrict our domain to the military history of US forces in the Pacific during World War 2. The main reason is that we tested our current prototype MDBMS on a military application based on this domain. For our purposes, we therefore apply our approximate matching technique to the domain of military history. However, we claim that our approximate matching technique can be applied to other multimedia applications as well.

Figure 3 shows an example of the generalization hierarchy of a plane, a noun concept in our domain of discourse. It represents the domain-dependent knowledge on the planes that took part in the Pacific during World War 2. We assume that the reader is familiar with object-oriented concepts such as object, class, method, and inheritance along a class hierarchy or lattice. We also assume that the arrows in Figure 3 point from a class to its subclass. In Figure 3, the Plane class is specialized into the classes Transport, Fighter, Bomber and Seaplane. Class Transport is specialized into class C-47, and class Fighter is specialized into classes F6F-Hellcat, Corsair and Zero. In addition, class Bomber is specialized into class B-25 and class Divebomber, which is further specialized into classes Zero, Dauntless and Stuka.

The  generalization  hierarchy  of  a  plane  is  a  class  lattice  since  class  Zero  has  two  superclasses, 
namely  class  Fighter  and  class  Divebomber.  In  addition,  properties  of  superclasses  are  inherited  by 
all  their  subclasses  along  the  superclass/subclass  hierarchy  but  not  vice  versa. 
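As an illustration only, the lattice of Figure 3 can be encoded as a map from each class to its direct subclasses. The Python sketch below (the names `SUBCLASSES`, `PROPERTIES` and the helper functions are hypothetical, and so are the three property names) recovers the two superclasses of Zero and shows that properties flow from superclasses down to subclasses, never upward.

```python
from __future__ import annotations

from collections import deque

# Figure 3: each class mapped to its direct subclasses (arrows point to subclasses).
SUBCLASSES = {
    "Plane": ["Transport", "Fighter", "Bomber", "Seaplane"],
    "Transport": ["C-47"],
    "Fighter": ["F6F-Hellcat", "Corsair", "Zero"],
    "Bomber": ["B-25", "Divebomber"],
    "Divebomber": ["Zero", "Dauntless", "Stuka"],
}

# Hypothetical properties attached to a few classes, for illustration only.
PROPERTIES = {
    "Plane": {"has_wings"},
    "Fighter": {"air_to_air_role"},
    "Divebomber": {"dive_capable"},
}


def superclasses(cls: str) -> set[str]:
    """Direct superclasses of cls."""
    return {parent for parent, kids in SUBCLASSES.items() if cls in kids}


def all_superclasses(cls: str) -> set[str]:
    """Direct and indirect superclasses of cls (breadth-first walk upward)."""
    seen: set[str] = set()
    frontier = deque([cls])
    while frontier:
        for parent in superclasses(frontier.popleft()):
            if parent not in seen:
                seen.add(parent)
                frontier.append(parent)
    return seen


def inherited_properties(cls: str) -> set[str]:
    """Own properties plus those of every superclass (never those of subclasses)."""
    props = set(PROPERTIES.get(cls, set()))
    for sup in all_superclasses(cls):
        props |= PROPERTIES.get(sup, set())
    return props


def has_multiple_inheritance() -> bool:
    """True if some class has more than one direct superclass, i.e. a lattice."""
    classes = set(SUBCLASSES) | {k for kids in SUBCLASSES.values() for k in kids}
    return any(len(superclasses(c)) > 1 for c in classes)


print(sorted(superclasses("Zero")))          # ['Divebomber', 'Fighter']
print(sorted(inherited_properties("Zero")))  # ['air_to_air_role', 'dive_capable', 'has_wings']
```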
Figure 3 is one example of domain-dependent knowledge corresponding to a noun concept (i.e. plane) in the domain of discourse. In general, we can have domain-dependent knowledge for all noun and verb concepts in our domain of discourse. Some noun and verb concepts may belong to the same class or generalization hierarchy; hence, a separate generalization hierarchy need not be created for each and every noun or verb concept.

5.3  Partial  Matching  Algorithm 

In this section, we discuss our partial matching algorithm. For clarity, we develop the algorithm by working through an example. Unless stated otherwise, we refer to the example generalization hierarchy given in Figure 3. Before going on, we describe what the algorithm is meant to achieve.

Suppose that images of planes are stored in the multimedia database and that the images are described as transport planes. Now assume that a user asks for all planes which are C-47s. Even though there is no exact match, we should retrieve the stored transport planes, because any C-47 is a transport plane according to the domain-dependent knowledge. If, instead, the user asks for all fighter planes, we cannot simply retrieve the transport planes, because they may not be what the user wants. Similarly, a query for planes should retrieve the stored transport planes more readily than a query for fighter planes, because a transport plane is still a plane but is not a fighter plane.

Our algorithm also aims to minimize the influence of how the hierarchy is defined, which depends on its designer. The designer of a generalization hierarchy may have a view of the domain-dependent knowledge that is not consistent with the views of other people. This may bias partial matching toward some branches of the generalization hierarchy over others.

A partial matching algorithm has to deal with problems such as the ones described above and provide a general solution. We address these problems by using heuristics to assign weights to the classes of one or more generalization hierarchies. Our major objective is a weight ranking scheme that is both fair and accurate, and that can be used to determine whether stored multimedia data should be retrieved for a given user description.

5.3.1  Weight  Ranking  Scheme  within  a  Generalization  Hierarchy 

In this section, we discuss the weight ranking strategy used by our partial matching algorithm for a single generalization hierarchy. The strategy used for a group of generalization hierarchies is discussed in the next section. The weight ranking strategy for a generalization hierarchy follows from the semantics of the class hierarchy (lattice), i.e. the IS-A hierarchy supported in an object-oriented data model.

Consider a class C in a generalization (class) hierarchy for a noun or a verb concept. Assuming that a class other than C with a positive weight is a specialization of C, while a class with a negative weight is a generalization of C, we introduce the following two general heuristics.

Heuristic  1:  All  direct  (indirect)  subclasses  of  C  have  positive  weights. 

Heuristic 2: All direct (indirect) superclasses of C have negative weights.

Heuristic 1 says that, given a class C specified in a user query, all subclasses of C in the class hierarchy to which C belongs are specializations of C and are given higher (positive) weights. Heuristic 2 says that, given a class C specified in a user query, all superclasses of C in the class hierarchy to which C belongs are generalizations of C and are given lower (negative) weights. This reasoning follows directly from the definition of a class (IS-A) hierarchy and the relationships among classes along the class hierarchy in an object-oriented data model.

The assignment of negative weights to generalizations is intuitively clear. The assignment of positive weights to specializations is based on the fact that a specialization inherits all properties of its parent nodes and has additional information of its own. Hence, positive (higher) weights should be assigned to the nodes on paths toward specialization.

Given these heuristics, all classes in the class hierarchy that have positive weights relative to class C, which is specified in the user query as either a noun or a verb concept, are selected during approximate matching. This is because all classes with positive weights are subclasses (specialized classes) of C. Since each of these classes is a specialized version of C, it has all the properties of C and indeed is a C.

On the other hand, classes in the class hierarchy that have negative weights relative to the class C specified by the user query should be selected restrictively, depending on their weights. This is because all classes with negative weights are superclasses (generalized classes) of C along the class hierarchy. Since each of these classes is a generalized version of C, it does not have all the properties of C and is not a C. Which of these classes to select depends on information from the user about how far to generalize.

The weight ranking system introduced so far is not yet fully defined. What is defined is that, given a class C in a class hierarchy of interest, any class in the same class hierarchy that is assigned a positive weight is always selected during approximate matching, whereas a class in the same hierarchy that is assigned a negative weight is selected only if its weight exceeds a threshold given by the user. We now discuss how weights are assigned to the different classes in the class hierarchy of interest.
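The selection rule just stated can be written as a one-line filter. In this Python sketch, `select_classes` is a hypothetical name, the threshold is assumed to be a negative number, and a class whose weight equals the threshold is assumed to be kept; the sample weights are those reported later for the query “A transport plane sank in the Pacific”.

```python
from __future__ import annotations


def select_classes(weights: dict[str, float], threshold: float) -> list[str]:
    """Return the classes kept during approximate matching.

    Classes with weight >= 0 (the query class and its subclasses) are always
    kept; a negative-weight class is kept only if its weight is at or above
    the user's threshold (assumed to be a negative number).
    """
    return [cls for cls, w in weights.items() if w >= 0 or w >= threshold]


weights = {"Transport": 0, "C-47": 40, "Plane": -20, "Fighter": -44, "Corsair": -56}
print(select_classes(weights, threshold=-50))  # ['Transport', 'C-47', 'Plane', 'Fighter']
```

Lowering the threshold (e.g. to -60) admits more distant classes such as Corsair; raising it narrows the answer.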

There are three different situations in which weights can be assigned to classes in a class hierarchy; they are shown in Figure 4. Suppose that the class specified in a user query is class C. As before, we assume that the arrows point from a class to its subclass; for example, in Figure 4(a), class C is a superclass of class X and class X is a subclass of class C. The first situation, shown in Figure 4(a), is to assign a weight to a class (X or Y) which is a subclass of class C. The second situation, shown in Figure 4(b), is to assign a weight to a class (X or Y) which is a superclass of class C. The third situation, shown in Figure 4(c), is to assign a weight to a class (Y) which is a subclass of a superclass (i.e. X) of class C.

Figure 4: Three Situations in a Class Hierarchy

The principles behind our weight ranking system are simple. We assume that all classes with positive weights, and those classes with negative weights that exceed a threshold value, are selected during approximate matching. First, we assign a weight of 0 to the class C specified in the user query; class C is the reference point for all other classes in the class hierarchy during approximate matching. Classes which are subclasses of C are assigned positive weights because they are specialized versions of C. Specialized versions of C carry more specific and definite information than C itself and hence are assigned positive weights instead of 0. For our purposes, all subclasses of C are assigned the same positive weight.

Classes which are superclasses of C, or subclasses of superclasses of C, are assigned negative weights because they are generalized versions of C. Generalized versions of C carry less, and more general, information than C itself and hence are assigned negative weights. Different generalized versions have different negative weights: the further away a class is from C in the class hierarchy, the more negative its weight. However, in assigning negative weights, we have to minimize the influence of the designer's definition of the hierarchy.

In most systems, the weight of a class decreases linearly with its distance, in levels, from the class C specified in the user query. We believe that this is not the correct approach, because the distance of a particular class from the class of interest (here C), relative to other classes, is not absolute but an artificial distance caused by a particular designer's view of the domain knowledge. The main problem with this approach is that some classes in a lengthy branch could be unfairly disqualified because of their larger negative weights. Our weight ranking system tries to minimize the bias against lengthy branches of a class hierarchy relative to shorter ones.

Given that class C is the class specified by the user query, as shown in Figure 4, the weight assignment formulas for classes in a class hierarchy in the three situations mentioned are as follows.

(1) Class specified by user query (i.e. class C in Figure 4)

$$\text{weight} = 0$$

(2) Subclass of class C (i.e. class X or Y in Figure 4(a))

$$\text{weight} = \alpha$$

where α is an integer constant

(3) Superclass of class C (i.e. class X or Y in Figure 4(b))

$$\text{weight} = -\alpha \sum_{i=1}^{n} \left(\frac{1}{\beta}\right)^{i}$$

where α, β are integer constants and n is the level number of the superclass relative to class C

(4) Subclass of a superclass of class C (i.e. class Y in Figure 4(c))

$$\text{weight} = -\left( \alpha \sum_{i=1}^{h} \left(\frac{1}{\beta}\right)^{i} + \gamma \sum_{j=1}^{l} \left(\frac{1}{\nu}\right)^{l+1-j} \right)$$

where α, β, γ, ν are integer constants; h is the level number of the superclass relative to class C and l is the level number of the subclass relative to that superclass

In our scheme, a class which is assigned a positive weight is always selected during partial matching. A class with a negative weight can be selected provided that its weight is not below a threshold value set by the user. To illustrate the weight assignments for different classes, we next give some examples using the class hierarchy of Figure 3. Given a user query, if the image corresponding to the user description is not found in the database, the system automatically proceeds with approximate matching. Using the weight assignment formulas, the weights of some of the classes in the class hierarchy for several user query descriptions are as follows. For the sake of argument, we assume that the values of α, β, γ and ν are 40, 2, 48 and 2, respectively.

(1)  “A  transport  plane  sank  in  the  Pacific” 

Transport  =  0,  C-47  =  40,  Plane  =  -20,  Fighter  =  -44,  Corsair  =  -56 

(2)  “A  F6F-Hellcat  sank  in  the  Pacific” 

F6F-Hellcat  =  0,  Plane  =  -30,  Seaplane  =  -54,  Stuka  =  -72,  C-47  =  -66 

(3)  “A  bomber  sank  in  the  Pacific” 

Bomber = 0, Stuka = 40, B-25 = 40, Plane = -20, Seaplane = -44, C-47 = -56
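As a check, the worked arithmetic below (ours, not part of the original text) recomputes three of the listed weights from formulas (3) and (4) with α = 40, β = 2, γ = 48 and ν = 2. The levels are read off Figure 3: for the query Transport, Plane is one level up and Fighter one level below Plane; for the query F6F-Hellcat, Plane is two levels up and Stuka three levels below it (Plane, Bomber, Divebomber, Stuka).

```latex
\begin{aligned}
\text{Plane (query Transport, } n = 1\text{)}: \quad & -40 \cdot \tfrac{1}{2} = -20 \\
\text{Fighter (query Transport, } h = 1,\ l = 1\text{)}: \quad & -\left(40 \cdot \tfrac{1}{2} + 48 \cdot \tfrac{1}{2}\right) = -(20 + 24) = -44 \\
\text{Stuka (query F6F-Hellcat, } h = 2,\ l = 3\text{)}: \quad & -\left(40\left(\tfrac{1}{2} + \tfrac{1}{4}\right) + 48\left(\tfrac{1}{8} + \tfrac{1}{4} + \tfrac{1}{2}\right)\right) = -(30 + 42) = -72
\end{aligned}
```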

In the examples shown, all classes which are assigned positive weights are selected during partial matching. In example (1), the class C-47 has a positive weight of 40. This means that the image whose description is “A C-47 sank in the Pacific” is selected during partial matching. As the examples show, all classes which are subclasses of the class specified in a user query are assigned positive weights, and all classes which are superclasses, or subclasses of the superclasses, of the class specified in a user query are assigned negative weights. For the latter classes, the weight becomes more negative the further the class lies from the class specified in the user query along the class hierarchy, although not linearly. In example (2), the class Seaplane has a negative weight of -54, so the image whose description is “A Seaplane sank in the Pacific” has a weight of -54. Class Stuka has a more negative weight of -72, since it is further away from F6F-Hellcat than class Seaplane is.
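The weights in examples (1) to (3) can also be reproduced mechanically. The Python sketch below encodes Figure 3, finds the level distances n, h and l by breadth-first search, and applies the four formulas with α = 40, β = 2, γ = 48 and ν = 2. Two choices are our assumptions, because the text does not say how to handle several paths in a lattice: distances are shortest paths, and when a class can be reached through more than one superclass of the query class, the least negative weight is kept. The names `SUBCLASSES`, `distances` and `weight` are hypothetical.

```python
from collections import deque
from fractions import Fraction

ALPHA, BETA, GAMMA, NU = 40, 2, 48, 2  # constants used in the worked examples

# Figure 3: each class mapped to its direct subclasses.
SUBCLASSES = {
    "Plane": ["Transport", "Fighter", "Bomber", "Seaplane"],
    "Transport": ["C-47"],
    "Fighter": ["F6F-Hellcat", "Corsair", "Zero"],
    "Bomber": ["B-25", "Divebomber"],
    "Divebomber": ["Zero", "Dauntless", "Stuka"],
}
# Inverted edges: each class mapped to its direct superclasses.
SUPERCLASSES = {}
for parent, kids in SUBCLASSES.items():
    for kid in kids:
        SUPERCLASSES.setdefault(kid, []).append(parent)


def distances(start, edges):
    """Shortest number of edges from start to every class reachable via edges."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        cls = queue.popleft()
        for nxt in edges.get(cls, []):
            if nxt not in dist:
                dist[nxt] = dist[cls] + 1
                queue.append(nxt)
    return dist


def up(n):    # alpha * sum_{i=1..n} (1/beta)^i
    return sum((Fraction(ALPHA, BETA**i) for i in range(1, n + 1)), Fraction(0))


def down(l):  # gamma * sum_{j=1..l} (1/nu)^(l+1-j)
    return sum((Fraction(GAMMA, NU ** (l + 1 - j)) for j in range(1, l + 1)), Fraction(0))


def weight(query, target):
    """Weight of target relative to the query class, or None if unrelated."""
    if target == query:
        return Fraction(0)                      # formula (1)
    if target in distances(query, SUBCLASSES):
        return Fraction(ALPHA)                  # formula (2)
    above = distances(query, SUPERCLASSES)
    if target in above:
        return -up(above[target])               # formula (3)
    candidates = []                             # formula (4), over every superclass X
    for x, h in above.items():
        below_x = distances(x, SUBCLASSES)
        if h > 0 and target in below_x:
            candidates.append(-(up(h) + down(below_x[target])))
    return max(candidates) if candidates else None


for query, targets in [
    ("Transport", ["C-47", "Plane", "Fighter", "Corsair"]),
    ("F6F-Hellcat", ["Plane", "Seaplane", "Stuka", "C-47"]),
    ("Bomber", ["Stuka", "B-25", "Plane", "Seaplane", "C-47"]),
]:
    print(query, {t: str(weight(query, t)) for t in targets})
```

Running it prints the weights listed in examples (1) to (3), using class B-25 of Figure 3 for the bomber example. Exact fractions are used so that deeper levels (where α/β^i is no longer an integer) are not rounded.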

Suppose instead that the weight of a class decreases linearly with its distance in levels from the class C specified in the user query. If we assign a constant weight of, say, -10 for each level away from C, a class which is 5 levels away from C will have a weight of -50, compared to -20 for a class which is 2 levels away. For example, if the user query is “A transport sank in the Pacific”, the weight of class Transport is 0, that of class Seaplane (2 levels away) is -20 and that of class Stuka (4 levels away) is -40. Using our formulas, the same user query assigns weights of 0, -44 and -62 to classes Transport, Seaplane and Stuka, respectively. The linear method thus penalizes class Stuka relative to class Seaplane more heavily than our method does: under the linear method, Stuka's weight is twice Seaplane's, whereas under ours it is about 1.4 times Seaplane's.
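The comparison above can be checked numerically. This Python sketch (function names hypothetical) evaluates the linear scheme with -10 per level and formula (4) with α = 40, β = 2, γ = 48, ν = 2 for the query Transport, then prints how much more heavily Stuka is penalized than Seaplane under each scheme.

```python
from fractions import Fraction

ALPHA, BETA, GAMMA, NU = 40, 2, 48, 2  # constants from the worked examples
STEP = -10                              # per-level weight of the linear scheme


def linear_weight(levels: int) -> int:
    """Linear scheme: a fixed penalty for every level away from the query class."""
    return STEP * levels


def proposed_weight(h: int, l: int) -> Fraction:
    """Formula (4): climb h levels to a superclass, then descend l levels."""
    up = sum((Fraction(ALPHA, BETA**i) for i in range(1, h + 1)), Fraction(0))
    down = sum((Fraction(GAMMA, NU ** (l + 1 - j)) for j in range(1, l + 1)), Fraction(0))
    return -(up + down)


# Query "A transport sank in the Pacific": Seaplane is 2 levels away (h=1, l=1),
# Stuka is 4 levels away (h=1, l=3), both via class Plane.
seaplane = {"linear": linear_weight(2), "proposed": proposed_weight(1, 1)}
stuka = {"linear": linear_weight(4), "proposed": proposed_weight(1, 3)}

for scheme in ("linear", "proposed"):
    ratio = float(stuka[scheme] / seaplane[scheme])
    print(f"{scheme}: Seaplane={seaplane[scheme]}, Stuka={stuka[scheme]}, ratio={ratio:.2f}")
```

The ratios printed are 2.00 for the linear scheme and 1.41 for the proposed one, which is the reduced bias against the longer branch described in the text.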

It is very difficult to quantify how much closer class Seaplane is to class Transport than class Stuka is, as both Seaplane and Stuka are types of planes. Our formulas are designed to minimize this bias as far as possible. With the linear method, a user is more likely to choose a threshold value that admits Seaplane but excludes Stuka during approximate matching than with our method. Another difficult task is setting the values of the constants in our assignment formulas and the threshold value. The user must choose values for the constants and the threshold depending on the number of objects that qualify during approximate matching. Hence, the system needs to interact with the user through the user interface throughout the matching process.

5.3.2 Weight Ranking Scheme for a Group of Generalization Hierarchies

In  the  previous  section,  we  discussed  the  ranking  of  weights  for  classes  belonging  to  the  same 
generalization  hierarchy.  In  this  section,  we  extend  the  ranking  of  weights  for  classes  belonging  to 
different  generalization  hierarchies.  In  our  scheme,  the  ranking  of  weights  for  classes  in  different 
class  hierarchies  is  influenced  by  the  following  rules. 

Rule 1: For each local class hierarchy, the weight ranking system discussed in Section 5.3.1 is applied.

Rule 2: For different class hierarchies, the user determines the priority order of the class hierarchies.

Using rule 1, the classes within each class hierarchy selected by the user query are assigned weights with the weight ranking system discussed in Section 5.3.1, which is a straightforward process. Hence, regardless of the number of class hierarchies involved, all classes belonging to these hierarchies can be assigned weights for partial matching. Rule 1 does not cause any problems, because weights are assigned independently in each class hierarchy, with no interrelationship between classes of different hierarchies.

The global ranking of weights across different class hierarchies is a problem, because the weights assigned within each class hierarchy now have to be considered with respect to the weights assigned for other class hierarchies, and they have to be meaningful globally. Rule 2 determines the priority order of the selected class hierarchies, which is obtained from the user through the user interface. Different class hierarchies can then be assigned different weights according to this priority order.

Figure 5: Combination of two Class Hierarchies

Figure 5 shows a combination of classes belonging to two different class hierarchies. Consider a user query description “C and D” involving classes C and D, which belong to different class hierarchies in Figure 5. CH1 is the name of the class hierarchy on the left and CH2 is the name of the class hierarchy on the right in Figure 5. There are three generic partial matching combination types, given below; the example classes are taken from Figure 5.

Type 1: C of CH1, and any class in CH2 except D.

Type 2: Any class in CH1 except C, and D of CH2.

Type 3: Any class in CH1 except C, and any class in CH2 except D.

Weights for type 1 and type 2 combinations are easy to rank with the weight ranking system discussed previously, because we only need to assign weights to classes in one of the class hierarchies, not both. Type 3 combinations, however, require closer attention because they require assigning weights to classes in different class hierarchies. To assign weights in this case, we determine the priority order of CH1 and CH2 through feedback from the user: through the user interface, we learn which class hierarchy has the higher priority. We then assign different weights for CH1 and CH2 depending on this priority order.

This can be expressed using the following weight formulas. The constants α and β (which are not the constants of the same names in Section 5.3.1) have to be determined by the user through the user interface.

(1) Weight (Type 1) = Weight (CH2)

(2) Weight (Type 2) = Weight (CH1)

(3) Weight (Type 3) = α · Weight (CH1) + β · Weight (CH2)
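A minimal Python sketch of these three formulas follows. The function name `combined_weight` is hypothetical, and so are the factor values (α = 2, β = 1) and the CH2 weights for Island and Sea, since the contents of Figure 6 are not given in the text; the CH1 weight of -20 for Transport relative to C-47 follows formula (3) with the constants of Section 5.3.1. With negative weights, a larger factor makes a mismatch in that hierarchy cost more.

```python
from __future__ import annotations


def combined_weight(desc: tuple[str, str], query: tuple[str, str],
                    w_ch1: float, w_ch2: float,
                    alpha: float, beta: float) -> float:
    """Weight of a stored description (CH1 class, CH2 class) for a query (C, D).

    w_ch1 and w_ch2 are the single-hierarchy weights of the description's
    classes relative to C (in CH1) and D (in CH2); alpha and beta are the
    user-supplied priority factors of the Type 3 formula.
    """
    c1, c2 = desc
    query_c, query_d = query
    if c1 == query_c:                      # Type 1 (or exact match, where w_ch2 == 0)
        return w_ch2
    if c2 == query_d:                      # Type 2
        return w_ch1
    return alpha * w_ch1 + beta * w_ch2    # Type 3


query = ("C-47", "Ocean")
print(combined_weight(("C-47", "Island"), query, 0, -30, alpha=2, beta=1))      # -30
print(combined_weight(("Transport", "Sea"), query, -20, -10, alpha=2, beta=1))  # -50
```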

Figure 6 is a generalization (class) hierarchy of the noun concept Place. Using Figures 3 and 6, given a user query description “A C-47 sank in the Ocean”, an example of a type 1 combination is a multimedia data item with the description “A C-47 landed in an Island”. A type 2 combination is a multimedia data item with the description “A Stuka sank in the Ocean”. Finally, a type 3 combination is a multimedia data item with the description “A Transport landed in Sea”.

Figure 6: Generalization Hierarchy of a Place

In this section, we discussed the assignment of weights for classes involving two different class hierarchies. In a practical system, the number of class hierarchies involved in weight assignment is large, since many noun and verb concepts are involved. The weight ranking scheme discussed in this section can be extended to assign weights for classes involving many class hierarchies; the main problem lies in how well the user interface obtains the required information from the user. The weight ranking system also has to be dynamic, since all constant values assigned by the user can change depending on the number of qualified multimedia data selected during partial matching. The user also has to determine the threshold value so that not too many multimedia data are selected from the database.

5.3.3  Application  of  Weighting  Algorithm 

The  application  of  the  weighting  algorithm  just  presented  requires  a  parser  to  understand  the 
natural  language  specifications  in  the  multimedia  data  descriptions  and  the  user  queries.  As  stated 
earlier,  the  descriptions  are  parsed  and  stored  in  the  system  as  predicates.  The  queries  are  processed 
as  follows. 

When a query is received from the user, the parser separates the natural language specification into smaller component groups, namely subject noun, verb and object noun phrases, each of which becomes a predicate. When these predicates exactly match the predicates in the descriptions of certain multimedia data, those multimedia data are retrieved. However, there may be other descriptions of multimedia data that are of interest to the user even though they do not match the query exactly. For the reasons stated earlier, we expect this latter category to be the usual case rather than the former.

To find the latter, we suggest that the system search the noun and verb generalization hierarchies of the object classes and assign weights to the descriptions as given in the weight assignment algorithm, using the weighting factors (α and β in the previous section) received from the user. The multimedia data whose combined weight exceeds the threshold value set by the user are then retrieved.
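The retrieval step can be sketched as follows, under stated assumptions. Each stored item is assumed to already carry the single-hierarchy weight of each caption component (subject, verb, object) relative to the query; the components are combined as a factor-weighted sum, i.e. the Type 3 formula extended to more hierarchies, which is our assumption rather than the paper's definition. `StoredItem`, `retrieve`, the item identifiers and all numbers are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class StoredItem:
    item_id: str
    # Single-hierarchy weight of each caption component relative to the query.
    component_weights: dict[str, float]


def retrieve(items: list[StoredItem], factors: dict[str, float],
             threshold: float) -> list[tuple[str, float]]:
    """Combine component weights as a factor-weighted sum, keep the items at or
    above the threshold, and return them best first."""
    scored = []
    for item in items:
        combined = sum(factors[name] * w for name, w in item.component_weights.items())
        if combined >= threshold:
            scored.append((combined, item.item_id))
    scored.sort(reverse=True)
    return [(item_id, combined) for combined, item_id in scored]


items = [
    StoredItem("img1", {"subject": 0, "verb": 0, "object": 0}),
    StoredItem("img2", {"subject": -20, "verb": 0, "object": 0}),
    StoredItem("img3", {"subject": -56, "verb": -30, "object": -10}),
]
factors = {"subject": 2, "verb": 1, "object": 1}
print(retrieve(items, factors, threshold=-60))  # [('img1', 0), ('img2', -40)]
```

Ranking the survivors by combined weight lets the interface show the closest matches first while the user adjusts the factors and threshold.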
The natural language query can be separated into components smaller than the three groups just stated. For example, a complex noun phrase may be separated into a number of small noun groups, and the weighting algorithm applied to these groups to obtain a combined weight; “the man with a mustache” can become two classes, namely man and mustache. Naturally, the finer the granularity of the separation, the larger and more complex the processing required.

6.  Graphical  User  Interface 

The goal of a graphical user interface is to support the query specification process, allowing the user to use the database system efficiently. It should allow inexperienced users to retrieve data from the database without having to know a specific query language. In today’s database management systems, the user is forced to think in terms of a data model and a query language, which differ greatly from the user’s way of thinking. Often a user can express a query easily in natural language but has difficulty expressing it in a given query language.

Most queries involve both media and formatted data. For the media part of the query, we use our intelligent matching algorithm, which directly processes natural language captions. For conditions on formatted data, natural language expressions are usually too imprecise to be processed directly. We try to overcome this problem by providing a graphical user interface that supports a natural query specification process.

The data model adopted in our system is an extended relational data model. Despite some drawbacks, the relational model has great advantages: it is well known, widely used and has a firm theoretical basis. For our purpose, we extend the relational model to capture media data types, and, as described below, we also extend the query language to allow the manipulation of media data and to facilitate the query specification process.

Before  describing  the  user  interface  of  the  MDBMS  system,  we  first  outline  ways  to  achieve  a 
natural  query  specification  process. 

6.1  Towards  a  natural  query  specification 

Usually, every user can easily describe a query (or at least the desired result) in natural language. Unfortunately, natural language expressions representing a query are imprecise and difficult to translate automatically into a formal query language that a database management system understands. We argue that the gap between the user’s way of expressing a query in natural language and database manipulation languages like SQL can be narrowed considerably.

When comparing the user’s natural language (NL) expression of a query with the corresponding SQL statements, the first difficulty is that the table and attribute names do not exactly match. In a graphical user interface, this problem is easy to overcome: all table and attribute names can be presented to the user, who simply selects the desired ones using a pointing device (e.g. a mouse).

Another difficulty is related to joins between tables. In most cases the join condition is implicit in the user's NL expression. In examining a large number of queries expressed both in natural language and in SQL, we found that in most cases the join condition corresponds directly to some specific NL expression. Additionally, the number of joins used in most of the queries was small compared to the number of possible joins. This can be explained by two facts: first, the number of semantically meaningful joins is restricted, and second, some of the most frequently used joins are already anticipated at the design time of the database. In order to provide a natural way of expressing joins, our system allows the database designer and the user to define and name joins prior to their actual use. A predefined join can involve more than two tables (e.g., two tables are joined by means of a third table), thereby providing a simple way of expressing m:n relationships. Once defined and named, all predefined joins can be used to specify a query. Predefined joins differ from views in two ways. First, the result of a predefined join is not a table, as in the case of a view, but a specific connection between tables. Second, predefined joins allow connections between different levels of nested queries, and even recursive joins can be expressed. An example using predefined joins is given in the next section.
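To make the idea of a predefined join more concrete, the following is a minimal sketch, not the system's actual implementation. It assumes a hypothetical registry `PREDEFINED_JOINS`, keyed by (left table, join name, right table), and a hypothetical helper `expand_join` that turns a connection such as `plane carries weapon` into the tables it involves and the join predicate it stands for. The table and column names (`plane_weapon`, `p_nr`, `w_nr`) follow the example used later in this section.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class PredefinedJoin:
    """A named connection between two tables, possibly via link tables."""
    via: tuple[str, ...]   # intermediate tables (empty for a direct join)
    condition: str         # join predicate over all involved tables


# Hypothetical registry, filled once by the database designer or the user.
PREDEFINED_JOINS: dict[tuple[str, str, str], PredefinedJoin] = {
    # m:n relationship expressed through the link table plane_weapon
    ("plane", "carries", "weapon"): PredefinedJoin(
        via=("plane_weapon",),
        condition=("plane.p_nr = plane_weapon.p_nr "
                   "and plane_weapon.w_nr = weapon.w_nr"),
    ),
}


def expand_join(left: str, name: str, right: str) -> tuple[list[str], str]:
    """Translate 'left name right' into (involved tables, SQL join predicate)."""
    try:
        join = PREDEFINED_JOINS[(left, name, right)]
    except KeyError:
        raise ValueError(f"no predefined join '{left} {name} {right}'") from None
    return [left, *join.via, right], join.condition


tables, condition = expand_join("plane", "carries", "weapon")
print("select p_name from " + ", ".join(tables) + " where " + condition)
```

The user only ever sees the name `carries`; the link table and the two equality predicates stay hidden, which is what lets an m:n relationship read like the NL phrase "planes which carry weapons".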

A second lesson from examining the process of query specification concerns the handling of complex queries. Given a complex data retrieval task, the user partitions it into smaller subtasks that are easier to handle. Starting with the clear parts of the query, the user works through all parts and combines the results into the final solution. Our system supports this way of handling complex queries through incremental query specification, described in the next section.

Finally, we observed that a special category of queries is easy to express in NL but rather complicated in a formal query language. Additional operators, closely related to the corresponding NL expressions, allow an easier and clearer query specification. Consider, for example, a query like ‘Select the name of planes which can carry all weapons of the category air-to-air’: we found that a special ‘all’ operator greatly enhances the readability and understandability of the SQL-like query, making it similar to the user's NL expression. For the example, we assume the tables plane, weapon and plane_weapon, and a predefined join named carries expressing the m:n relationship between planes and weapons.

    select p_name from plane
    where plane carries weapon
    and w_nr = all (select w_nr from weapon
                    where category = 'air-to-air')

A SQL statement expressing the same query without the all operator is rather complicated. Two possibilities are a formulation using the set comparison operator contains:

    select p_name from plane
    where ((select w_nr from plane_weapon A
            where plane.p_nr = A.p_nr)
           contains
           (select w_nr from weapon
            where category = 'air-to-air'))

and a formulation using two nested not exists subqueries:

    select p_name from plane
    where not exists
          (select * from weapon B
           where B.category = 'air-to-air'
           and not exists
               (select * from plane_weapon C
                where C.p_nr = plane.p_nr
                and C.w_nr = B.w_nr))
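The two formulations can be checked against each other on a toy database. The sketch below uses Python's built-in `sqlite3` module with invented sample rows. Because standard SQL (and SQLite) has no `contains` set operator, it runs only the double `not exists` formulation and compares it with a direct set-containment test in Python, which is what the proposed all operator expresses.

```python
import sqlite3

# Toy data (invented for illustration): plane 1 carries both air-to-air
# weapons, plane 2 carries only one of them.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table plane (p_nr integer primary key, p_name text);
    create table weapon (w_nr integer primary key, category text);
    create table plane_weapon (p_nr integer, w_nr integer);
    insert into plane values (1, 'F1'), (2, 'F2');
    insert into weapon values (10, 'air-to-air'), (11, 'air-to-air'),
                              (12, 'air-to-ground');
    insert into plane_weapon values (1, 10), (1, 11), (2, 10), (2, 12);
""")

# Relational division via double negation: there is no air-to-air weapon
# that the plane does not carry.
DIVISION_SQL = """
    select p_name from plane
    where not exists
          (select * from weapon B
           where B.category = 'air-to-air'
           and not exists
               (select * from plane_weapon C
                where C.p_nr = plane.p_nr
                and C.w_nr = B.w_nr))
"""
sql_result = [row[0] for row in conn.execute(DIVISION_SQL)]

# The same question as set containment, i.e. what an 'all' operator states.
required = {w for (w,) in conn.execute(
    "select w_nr from weapon where category = 'air-to-air'")}
set_result = []
for p_nr, p_name in conn.execute(
        "select p_nr, p_name from plane order by p_nr").fetchall():
    carried = {w for (w,) in conn.execute(
        "select w_nr from plane_weapon where p_nr = ?", (p_nr,))}
    if carried >= required:
        set_result.append(p_name)

print(sql_result, set_result)
```

Both formulations return only F1. They also agree on the edge case of an empty weapon category: every plane then qualifies, because there is no air-to-air weapon it fails to carry.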


6.2 Description of the Graphical User Interface

In this paper, we give a general idea of our graphical user interface by presenting a small example of the retrieval process. Due to space limitations, we describe only a small fraction of its capabilities.

After selecting the database, the user gets the system menu providing the main database manipulation functions: insert, delete, update, or retrieve. When selecting retrieve, the user gets the query specification window, and the first step is to select the tables to be used in the query. For each selected table, a list with all its attributes is displayed in a separate window, and all predefined connections involving at least one of the selected tables appear in the Connections window. To specify the result list (projection), the user moves the desired attributes to the Result List. Now only the conditions need to be specified. Using connections, attributes of the selected tables, and operators provided by the Tool Box, the query can easily be built with the mouse. In the Query Representation window the query is displayed graphically. Each part of the query is represented by a small box (simple conditions by a single box, subqueries by a double box), and the connecting lines are labeled with the kind of connection used. An advantage is that every part of the query can be addressed for editing or deletion at any time during the query specification process. To enhance the clarity of the display, parts of the query can be grouped together and displayed as one box (zoom in). If the user wants to see the query in full detail at a later stage, he can use the zoom out option.
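One way to picture the Query Representation window is as a tree of boxes. The sketch below is an assumption about a possible data structure, not the system's internal design: hypothetical classes `Cond` (a single box), `Subquery` (a double box) and `Combine` (boxes joined by a logical operator) each render themselves as SQL-like text, and `Group` shows several boxes as one labelled box until it is expanded again.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Union


@dataclass
class Subquery:
    """Double box: a nested select with an optional condition."""
    select: str
    table: str
    where: Box | None = None

    def render(self, collapsed: bool = False) -> str:
        sql = f"(select {self.select} from {self.table}"
        if self.where is not None:
            sql += f" where {self.where.render(collapsed)}"
        return sql + ")"


@dataclass
class Cond:
    """Single box: 'left op right', where right may itself be a subquery."""
    left: str
    op: str
    right: str | Subquery

    def render(self, collapsed: bool = False) -> str:
        if isinstance(self.right, str):
            right = self.right
        else:
            right = self.right.render(collapsed)
        return f"{self.left} {self.op} {right}"


@dataclass
class Combine:
    """Boxes connected by a logical operator such as and/or."""
    op: str
    parts: list[Box] = field(default_factory=list)

    def render(self, collapsed: bool = False) -> str:
        return f" {self.op} ".join(p.render(collapsed) for p in self.parts)


@dataclass
class Group:
    """Several boxes displayed as one labelled box while collapsed."""
    label: str
    inner: Box

    def render(self, collapsed: bool = False) -> str:
        return f"[{self.label}]" if collapsed else self.inner.render(collapsed)


Box = Union[Cond, Subquery, Combine, Group]

air_to_air = Subquery("w_nr", "weapon", Cond("category", "=", "'air-to-air'"))
query = Combine("and", [
    Cond("plane", "carries", "weapon"),
    Group("air-to-air weapons", Cond("w_nr", "= all", air_to_air)),
])
print(query.render())
print(query.render(collapsed=True))
```

Because every box is an ordinary node, any part can be located and edited or deleted at any time, which mirrors the editing freedom described above.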

To support incremental query specification, we allow the user to start with any part of the query and to combine the separate parts at a later stage. Additionally, we provide an option to save and reload any part of the query for later use.
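Saving, reloading and combining partial queries can be illustrated with a small sketch. It assumes, purely for illustration, that each query part is stored as a named SQL-like condition string in a JSON file; the helpers `save_part`, `load_part` and `combine` are hypothetical and only show how independently built parts can later be joined with a logical operator.

```python
import json
import tempfile
from pathlib import Path


def save_part(directory: Path, name: str, condition: str) -> Path:
    """Store one query part under a user-chosen name."""
    path = directory / f"{name}.json"
    path.write_text(json.dumps({"name": name, "condition": condition}))
    return path


def load_part(directory: Path, name: str) -> str:
    """Reload a previously saved query part."""
    data = json.loads((directory / f"{name}.json").read_text())
    return data["condition"]


def combine(op: str, *conditions: str) -> str:
    """Join independently specified parts, parenthesising each one."""
    return f" {op} ".join(f"({c})" for c in conditions)


with tempfile.TemporaryDirectory() as tmp:
    parts_dir = Path(tmp)
    # Parts may be specified in any order, even in separate sessions.
    save_part(parts_dir, "air_to_air",
              "w_nr = all (select w_nr from weapon where category = 'air-to-air')")
    save_part(parts_dir, "join", "plane carries weapon")
    where = combine("and", load_part(parts_dir, "join"),
                    load_part(parts_dir, "air_to_air"))

print("select p_name from plane where " + where)
```

Parenthesising each part keeps its meaning intact no matter which operator is used to combine it later.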

Another important aspect is the specification of the natural language description part of a query, which is necessary when media data are involved. If the user selects a media attribute in the specification of a condition, a special description editor is automatically displayed in a separate window, where the media description can be specified. The description editor has special features, including buttons to check the description, to present the hierarchy for a word, and to enter the weights of the different parts of the description needed for approximate matching.
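The editor's ‘present the hierarchy for a word’ button and its weight fields can be pictured with the sketch below. The small generalization hierarchy and the helper names (`HIERARCHY`, `hierarchy_path`, `normalise_weights`) are hypothetical; the sketch only shows walking from a word up to its most general category and turning the user's weights for the description parts into proportions that an approximate matcher could use.

```python
from __future__ import annotations

# Hypothetical fragment of a domain generalization hierarchy (child -> parent).
# Assumed to be acyclic, otherwise hierarchy_path would not terminate.
HIERARCHY = {
    "fighter": "plane",
    "bomber": "plane",
    "plane": "vehicle",
    "attack": "action",
}


def hierarchy_path(word: str) -> list[str]:
    """Return the word followed by its increasingly general ancestors."""
    path = [word]
    while path[-1] in HIERARCHY:
        path.append(HIERARCHY[path[-1]])
    return path


def normalise_weights(weights: dict[str, float]) -> dict[str, float]:
    """Scale user-entered weights for description parts so they sum to 1."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return {part: w / total for part, w in weights.items()}


print(hierarchy_path("fighter"))  # ['fighter', 'plane', 'vehicle']
print(normalise_weights({"subject": 2, "verb": 1, "object": 1}))
```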

To further explain the query specification process, we consider the following example:
‘Select the name, air base and image of planes which can carry all weapons of the category air-to-air and where the image shows the plane attacking a hostile plane’.

If a user wants to specify this query, he might want to start with an easy part, e.g., ‘weapons of the category air-to-air’. To specify this part, the user first selects Subquery in the Tool Box, which provides him with a second double box for his subquery. Then he selects weapon in the Tables window. As a result, he gets all attributes of the weapon table in a separate window, and by clicking on w_nr he selects the desired attribute. The next step is to specify the condition. By clicking on Cond in the Tool Box, he gets an empty condition box in the Query Representation window; by clicking on the attribute category in the weapon window and on ‘=’ in the Tool Box, and then typing in air-to-air, he fills the box with the actual condition.
Figure 7: Description Editor


As the next part, the user might want to specify the image description condition ‘image shows the plane attacking a hostile plane’. The specification process for this part is similar to that of the first part. The user selects the plane table and, after getting a new condition box, selects the attribute image from the plane window. Because image is a media attribute, the system automatically provides the special Description Editor window (see Figure 7). In this window the user can type the natural language description for the image, in our example ‘Plane attacks a hostile plane’. When the user clicks the Done button, the description is directly interpreted by the parser to obtain the equivalent predicates.
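The parser itself is not described in this section. As a rough, heavily simplified stand-in, the sketch below splits a short caption of the form subject, verb, object into three components, dropping articles and separating known adjectives such as ‘hostile’. The word lists and the name `parse_caption` are assumptions made for illustration; a real natural language parser would be far more robust.

```python
from __future__ import annotations

from dataclasses import dataclass, field

ARTICLES = {"a", "an", "the"}
# Hypothetical closed word lists standing in for a real lexicon:
# inflected verb form -> base form, and a few known adjectives.
VERBS = {"attacks": "attack", "attacking": "attack",
         "carries": "carry", "carrying": "carry", "shows": "show"}
ADJECTIVES = {"hostile", "friendly", "damaged"}


@dataclass
class Component:
    noun: str
    modifiers: list[str] = field(default_factory=list)


@dataclass
class Caption:
    subject: Component
    verb: str
    obj: Component


def _component(words: list[str]) -> Component:
    """Treat the last word as the head noun and known adjectives as modifiers."""
    if not words:
        raise ValueError("missing noun phrase")
    return Component(noun=words[-1],
                     modifiers=[w for w in words[:-1] if w in ADJECTIVES])


def parse_caption(text: str) -> Caption:
    """Split a simple 'subject verb object' caption into its components."""
    words = [w.strip(".,'\"") for w in text.lower().split()]
    words = [w for w in words if w and w not in ARTICLES]
    for i, word in enumerate(words):
        if word in VERBS:
            return Caption(_component(words[:i]), VERBS[word],
                           _component(words[i + 1:]))
    raise ValueError(f"no known verb in {text!r}")


print(parse_caption("Plane attacks a hostile plane"))
```

With this reduction the caption ‘Plane attacks a hostile plane’ and the query phrase ‘the plane attacking a hostile plane’ yield the same predicates, which is what makes component-wise matching possible.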

The last step is to specify the main part of the query and to compose the parts into the final result. Starting with the beginning of the query (‘Select name, air base and image’), the user moves the attributes p_name, air_base and image to the Result List window. By selecting Cond from the Tool Box and plane carries weapon from the Connections window, the user specifies the join condition. Now, as the last part of the query, the user has to specify the all condition. This is accomplished by getting a new condition box and clicking on w_nr in the weapon window, on ‘=’ and ‘all’ in the Tool Box, and on the double box representing the subquery ‘weapons of the category air-to-air’ in the Query Representation window. Finally, the conditions are combined into the final result by selecting them together with the logical operator AND from the Tool Box. Figure 8 shows the final result of the query specification process.

To represent the results, we chose a combined form- and list-oriented approach. Generally, the results are presented as a list. Media attributes are represented as buttons that give access to the media data. By clicking on a row of the list, a single tuple can be displayed in a form. Figure 9 shows the results of our example and the representation of one tuple in a customized form.
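The combined list and form presentation can be approximated in plain text. In the hypothetical sketch below, rows are dictionaries with invented sample values, attributes listed in an assumed `MEDIA_ATTRIBUTES` set are shown as a `[button]`-style placeholder instead of their raw content, and `as_form` renders one selected tuple as label and value lines; a real interface would draw widgets rather than strings.

```python
from __future__ import annotations

MEDIA_ATTRIBUTES = {"image"}  # assumption: which attributes hold media data


def cell(attr: str, value: object) -> str:
    """Media values become a button label; other values are shown as text."""
    return f"[{attr}]" if attr in MEDIA_ATTRIBUTES else str(value)


def as_list(rows: list[dict], columns: list[str]) -> list[str]:
    """List view: a header line, then one line per tuple."""
    lines = [" | ".join(columns)]
    for row in rows:
        lines.append(" | ".join(cell(c, row[c]) for c in columns))
    return lines


def as_form(row: dict) -> list[str]:
    """Form view of a single tuple, e.g. after the user clicks on its row."""
    width = max(len(attr) for attr in row)
    return [f"{attr.ljust(width)} : {cell(attr, value)}"
            for attr, value in row.items()]


# Invented sample tuples; the image bytes are placeholders.
rows = [
    {"p_name": "F1", "air_base": "North", "image": b"\x89PNG..."},
    {"p_name": "F2", "air_base": "South", "image": b"\x89PNG..."},
]
print("\n".join(as_list(rows, ["p_name", "air_base", "image"])))
print("\n".join(as_form(rows[0])))
```

Keeping media out of the list and behind a button means the list stays compact even when a tuple carries large image or sound data.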

In this paper, we presented only a small part of our graphical user interface. The data definition, insert, update and delete operations, query processing and optimization issues, predefined joins, and special operators and their semantics are beyond the scope of this paper and will be presented in a later paper [KEIM91].
7. Summary

A major problem faced in a multimedia database system is the retrieval of multimedia data such as a sound or an image. Media data is intrinsically rich in semantics, and conventional search methods used in databases and information retrieval (IR) systems may not work or are of little use. Most research on intelligent IR systems is concerned with natural language processing and deductive capabilities based on extended semantic models of the document content and of the user. However, most of these systems deal with exact matching or primitive partial matching using simple linear methods. Another problem faced in today's database systems is the lack of a natural way to specify complex queries. It is caused by the gap between the user's way of thinking and the query languages used in most systems. Although a lot of work has been done in the area of user interfaces for database systems, no query language comes close to the natural query specification process used by humans.

In this paper, we discussed these fundamental problems and outlined the architecture of our MDBMS. One contribution of our paper is the formulation of a partial matching algorithm that uses domain knowledge, represented using an object-oriented data model, together with a weight ranking system to assign weights to the different multimedia data stored in a database and to select those multimedia data that partially match a given user query description. Our parser, unlike others, provides an interpretation of natural language descriptions needed to achieve an intelligent retrieval of multimedia data. Additionally, it provides a mechanism to automatically partition a user query into its subject, verb and object components. This is essential because, during data retrieval, we use the partitioned components to match against generalization hierarchies of domain-dependent knowledge, which also deal with subject, verb and object categories. Further research is necessary to improve the parser so that it also automatically derives adjectives and other caption components, for complete understanding and processing of captions in the context of partial matching.
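The partial matching idea summarised above can be sketched in a few lines. This is not the algorithm of this paper, only an illustration under stated assumptions: a query component matches a caption component if the two terms are equal or if the caption term is a specialization of the query term in a hypothetical hierarchy `PARENT` (so a caption about a ‘fighter’ satisfies a query about a ‘plane’). Each matching component contributes its weight, and captions are ranked by the resulting weighted fraction; the weights and captions are invented.

```python
from __future__ import annotations

# Hypothetical domain hierarchy (specialization -> generalization), acyclic.
PARENT = {"fighter": "plane", "bomber": "plane", "missile": "weapon"}


def generalizes(query_term: str, caption_term: str) -> bool:
    """True if caption_term equals query_term or is one of its specializations."""
    term: str | None = caption_term
    while term is not None:
        if term == query_term:
            return True
        term = PARENT.get(term)
    return False


def score(query: dict[str, str], caption: dict[str, str],
          weights: dict[str, float]) -> float:
    """Weighted fraction of the query's components matched by the caption."""
    total = sum(weights[part] for part in query)
    matched = sum(weights[part] for part, term in query.items()
                  if part in caption and generalizes(term, caption[part]))
    return matched / total if total else 0.0


WEIGHTS = {"subject": 2, "verb": 1, "object": 1}
query = {"subject": "plane", "verb": "attack", "object": "plane"}
captions = {
    "img1": {"subject": "fighter", "verb": "attack", "object": "bomber"},
    "img2": {"subject": "fighter", "verb": "land", "object": "runway"},
    "img3": {"subject": "missile", "verb": "hit", "object": "tank"},
}
ranking = sorted(captions, key=lambda k: score(query, captions[k], WEIGHTS),
                 reverse=True)
print([(k, score(query, captions[k], WEIGHTS)) for k in ranking])
```

Matching only upward along the hierarchy is a deliberate choice in this sketch: a query about ‘fighter’ is not satisfied by a caption that merely mentions a ‘plane’, since that caption may show some other kind of plane.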

A second contribution of this paper is our graphical user interface. It narrows the gap between the user's way of thinking and formal query languages by means of graphical user interaction. In our system, we support incremental query specification, predefined joins and special operators to make the query specification process user friendly. The user is guided as much as possible, allowing a quick and almost error-free query specification. Further research is necessary to come even closer to the user's way of specifying queries, e.g., by allowing the user to communicate with the system directly in natural language.

We believe that our system provides a simple and elegant approach to both the retrieval of multimedia data and query specification. The simplicity of our retrieval method lies in exploiting the semantics of the generalization and specialization abstractions of the object-oriented model; the simplicity of the user interface lies in the natural way of query specification, which is derived directly from queries expressed in natural language. We also believe that our approaches are general and can readily be applied to other areas: our retrieval method can be used for other applications in IR and AI, and the ideas of our user interface can be applied to most database query interfaces.
Figure 8: Screen after Specifying the Query

Figure 9: Screen with the Results of the Query